Fine-tuning
Hugging Face Transformers provides access to thousands of pre-trained models for a wide range of tasks. When you use a pre-trained model, you train it further on a dataset specific to your task. This is called fine-tuning, and it is a very powerful training technique.
Pre-trained models can be fine-tuned in the following ways:
- Fine-tuning pre-trained models with Transformers Trainer
- Fine-tuning pre-trained models with TensorFlow
- Fine-tuning pre-trained models with native PyTorch
Prepare dataset
Before fine-tuning a pre-trained model, download a dataset and prepare it for training.
Start by loading the Yelp Reviews dataset.
$ pip install transformers datasets evaluate
from datasets import load_dataset
dataset = load_dataset("yelp_review_full")
dataset["train"][100]
{'label': 0,
'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. I\'ve worked at more than one location. I expect bad days, bad moods, and the occasional mistake. But I have yet to have a decent experience at this store. It will remain a place I avoid unless someone in my party needs to avoid illness from low blood sugar. Perhaps I should go back to the racially biased service of Steak n Shake instead!'}
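If you want a quick look at the dataset before preprocessing (an optional sanity check, not part of the original steps), you can print the DatasetDict and its features:
# Show the available splits and their sizes
print(dataset)
# Show the column types, including the label names
print(dataset["train"].features)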
Use a tokenizer to process the text and handle variable-length sequences. To batch-process the dataset, apply a preprocessing function to the entire dataset with the map method of Datasets.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
def tokenize_function(ds):
    return tokenizer(ds["text"], padding="max_length", truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
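As a quick check (optional), you can confirm that map added the tokenizer outputs alongside the original columns:
# The tokenized dataset keeps the original columns and adds the tokenizer outputs
print(tokenized_datasets["train"].column_names)
# Expect something like ['label', 'text', 'input_ids', 'token_type_ids', 'attention_mask']
print(len(tokenized_datasets["train"][0]["input_ids"]))   # padded to the model maximum length (512 for BERT)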
To save time, create a smaller subset of the full dataset for fine-tuning.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
Transformers Trainer
Hugging Face Transformers provides a Trainer class optimized for training models, making it easy to start training without having to manually write your own training loop. The Trainer API supports a wide range of training options and features, including logging, gradient accumulation, mixed precision, and more.
First, load the model and specify the number of labels expected; if you check the Yelp Review dataset, you will see that there are 5 labels.
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
TrainingArguments
Next, create a TrainingArguments instance. This class contains all the hyperparameters that can be tuned and the flags that set the training options.
from transformers import TrainingArguments
training_args = TrainingArguments(
output_dir='test_trainer', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
)
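The logging, gradient accumulation, and mixed-precision features mentioned above are also configured through TrainingArguments. A minimal sketch of such a configuration, assuming a CUDA GPU for fp16 (all values are illustrative):
training_args = TrainingArguments(
    output_dir='test_trainer',
    num_train_epochs=3,
    per_device_train_batch_size=8,     # smaller per-device batch size...
    gradient_accumulation_steps=2,     # ...accumulated over 2 steps for an effective batch size of 16
    fp16=True,                         # mixed precision, requires a CUDA GPU
    logging_dir='./logs',
    logging_steps=50,                  # log the training loss every 50 steps
)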
More information about TrainingArguments can be found in the Transformers documentation.
Evaluate
Trainer does not automatically evaluate model performance during training; you must pass it a function that calculates and reports metrics. The Evaluate library provides a simple accuracy function that can be loaded with evaluate.load.
import numpy as np
import evaluate
metric = evaluate.load("accuracy")
Call compute on metric to calculate the accuracy of the predictions. Before passing the predictions to compute, convert the logits to predictions (all Transformers models return logits).
def compute_metrics(pred):
    logits, labels = pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
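To illustrate the return format with dummy values (not taken from the dataset), compute_metrics receives a (logits, labels) tuple and returns a dictionary of metric values:
# Dummy logits for 3 examples and 5 classes, just to show the return format
dummy_logits = np.array([[0.1, 0.2, 3.0, 0.1, 0.1],
                         [2.5, 0.0, 0.1, 0.2, 0.3],
                         [0.0, 0.1, 0.2, 0.3, 4.0]])
dummy_labels = np.array([2, 0, 3])
print(compute_metrics((dummy_logits, dummy_labels)))  # {'accuracy': 0.666...}, 2 of 3 correct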
If you want to monitor evaluation metrics during fine-tuning, set the evaluation_strategy parameter in your training arguments to report the evaluation metric at the end of each epoch.
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
output_dir='test_trainer', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
evaluation_strategy="epoch"
)
Trainer
Create a Trainer object with the model, training arguments, training and evaluation datasets, and the evaluation function.
trainer = Trainer(
model=model,
args=training_args,
train_dataset=small_train_dataset,
eval_dataset=small_eval_dataset,
compute_metrics=compute_metrics,
)
Call train()
to fine-tune the model.
trainer.train()
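Once training finishes, you will usually want to run a final evaluation and save the model. A short sketch using standard Trainer methods (the directory name here is just an example):
# Run evaluation on the eval_dataset that was passed to the Trainer
metrics = trainer.evaluate()
print(metrics)
# Save the fine-tuned model so it can be reloaded later with from_pretrained()
trainer.save_model("test_trainer/final_model")    # hypothetical output path
tokenizer.save_pretrained("test_trainer/final_model")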
TensorFlow
You can also train Hugging Face Transformers models in TensorFlow using the Keras API.
If you want to train a Hugging Face Transformers model with the Keras API, you will need to convert the dataset into a format that Keras can understand. If the dataset is small, convert the entire dataset into a NumPy array and pass it to Keras.
First, load the dataset. Since this is a simple binary text classification task, we will use the CoLA dataset from the GLUE benchmark and just take the training split for now.
from datasets import load_dataset
dataset = load_dataset("glue", "cola")
dataset = dataset["train"] # Just take the training split for now
Next, load a tokenizer and tokenize the data as NumPy arrays. Since the labels are already a list of 0s and 1s, we convert them directly to a NumPy array without tokenization.
import numpy as np
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
labels = np.array(dataset["label"]) # Label is already an array of 0 and 1
Finally, load, compile, and fit the model.
from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
model.compile(optimizer=Adam(3e-5))
model.fit(tokenized_data, labels)
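If the dataset is too large to fit in memory as NumPy arrays, Transformers TensorFlow models also provide prepare_tf_dataset, which streams batches as a tf.data.Dataset. A rough sketch of that alternative (the batch size is illustrative):
def tokenize_dataset(examples):
    # No padding here; prepare_tf_dataset pads each batch on the fly using the tokenizer
    return tokenizer(examples["sentence"])
tokenized_dataset = dataset.map(tokenize_dataset)
tf_dataset = model.prepare_tf_dataset(
    tokenized_dataset,
    batch_size=16,          # illustrative value
    shuffle=True,
    tokenizer=tokenizer,    # used to pad each batch
)
model.fit(tf_dataset)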
PyTorch
You can also fine-tune a Hugging Face Transformers model in native PyTorch.
First, manually post-process the tokenized_datasets created earlier to prepare it for training.
Remove the text column, since the model cannot accept raw text as input.
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
Rename the label column to labels, since the model expects the argument to be named labels.
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
Set the format of the dataset to return PyTorch tensors instead of lists.
tokenized_datasets.set_format("torch")
Next, create a smaller subset of the data set to save time.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
Create a DataLoader for the training and test datasets so that you can iterate over batches of data.
from torch.utils.data import DataLoader
train_dataloader = DataLoader(small_train_dataset, shuffle=True, batch_size=8)
eval_dataloader = DataLoader(small_eval_dataset, batch_size=8)
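As a quick check (optional), you can confirm that each batch is a dictionary of tensors the model can consume directly:
batch = next(iter(train_dataloader))
print({k: v.shape for k, v in batch.items()})
# input_ids, token_type_ids and attention_mask have shape [8, 512]; labels has shape [8]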
Load the model and specify the number of expected labels, as before.
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
Create an optimizer and a learning rate scheduler for fine-tuning the model. We will use PyTorch's AdamW optimizer.
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=5e-5)
Create the default learning rate scheduler from Trainer using get_scheduler.
from transformers import get_scheduler
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    name="linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
If a GPU is available, specify the device so that the model runs on the GPU.
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
To track training progress, use the tqdm library to add a progress bar based on the number of training steps.
from tqdm.auto import tqdm
progress_bar = tqdm(range(num_training_steps))
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
Just as you added an evaluation function to Trainer, you need to do the same when writing your own training loop. But instead of calculating and reporting the metric at the end of each epoch, this time you accumulate all the batches with add_batch and calculate the metric at the very end.
import evaluate
metric = evaluate.load("accuracy")
model.eval()
for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])
metric.compute()
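After the loop and the evaluation are done, you can save the fine-tuned model the same way as with Trainer. A minimal sketch (the directory name is just an example):
# Save the fine-tuned weights and tokenizer so they can be reloaded later with from_pretrained()
save_dir = "finetuned-bert-yelp"    # hypothetical output directory
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)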
Additional resources
For more detailed fine-tuning examples, see below.
- Scripts for training common NLP tasks in PyTorch and TensorFlow
- https://huggingface.co/docs/transformers/notebooks - various notebooks on how to fine-tune models for specific tasks in PyTorch and TensorFlow