Fine-tuning
Hugging Face Transformers provides access to thousands of pre-trained models for a wide range of tasks. When you use a pre-trained model, you train it further on a dataset specific to your task. This is called fine-tuning, and it is a very powerful training technique.
Pre-trained models can be fine-tuned in the following ways:
- Fine-tuning pre-trained models with Transformers Trainer
- Fine-tuning pre-trained models with TensorFlow
- Fine-tuning pre-trained models with native PyTorch
Prepare dataset
Before fine-tuning a pre-trained model, download a dataset and prepare it for training.
Start by loading the Yelp Reviews dataset.
$ pip install transformers datasets evaluate
from datasets import load_dataset
dataset = load_dataset("yelp_review_full")
dataset["train"][100]
{'label': 0,
'text': 'My expectations for McDonalds are t rarely high. But for one to still fail so spectacularly...that takes something special!\\nThe cashier took my friends\'s order, then promptly ignored me. I had to force myself in front of a cashier who opened his register to wait on the person BEHIND me. I waited over five minutes for a gigantic order that included precisely one kid\'s meal. After watching two people who ordered after me be handed their food, I asked where mine was. The manager started yelling at the cashiers for \\"serving off their orders\\" when they didn\'t have their food. But neither cashier was anywhere near those controls, and the manager was the one serving food to customers and clearing the boards.\\nThe manager was rude when giving me my order. She didn\'t make sure that I had everything ON MY RECEIPT, and never even had the decency to apologize that I felt I was getting poor service.\\nI\'ve eaten at various McDonalds restaurants for over 30 years. I\'ve worked at more than one location. I expect bad days, bad moods, and the occasional mistake. But I have yet to have a decent experience at this store. It will remain a place I avoid unless someone in my party needs to avoid illness from low blood sugar. Perhaps I should go back to the racially biased service of Steak n Shake instead!'}
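If you want a quick look at the dataset before preprocessing (an optional sanity check, not part of the original steps), you can print the DatasetDict and its features:
# Show the available splits and their sizes
print(dataset)
# Show the column types, including the label names
print(dataset["train"].features)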
Use a tokenizer to process the text and handle variable-length sequences. To batch-process the dataset, apply a preprocessing function to the entire dataset with the map method of Datasets.
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
def tokenize_function(ds):
    return tokenizer(ds["text"], padding="max_length", truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
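As a quick check (optional), you can confirm that map added the tokenizer outputs alongside the original columns:
# The tokenized dataset keeps the original columns and adds the tokenizer outputs
print(tokenized_datasets["train"].column_names)
# Expect something like ['label', 'text', 'input_ids', 'token_type_ids', 'attention_mask']
print(len(tokenized_datasets["train"][0]["input_ids"]))   # padded to the model maximum length (512 for BERT)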
To save time, create a smaller subset of the full dataset for fine-tuning.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
Transformers Trainer
Hugging Face Transformers provides a Trainer class optimized for training models, making it easy to start training without having to manually write your own training loop. The Trainer API supports a wide range of training options and features, including logging, gradient accumulation, mixed precision, and more.
First, load the model and specify the number of labels expected; if you check the Yelp Review dataset, you will see that there are 5 labels.
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
TrainingArguments
Next, create a TrainingArguments instance. This class contains all the hyperparameters that can be tuned and the flags that set the training options.
from transformers import TrainingArguments
training_args = TrainingArguments(
output_dir='test_trainer', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
)
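The logging, gradient accumulation, and mixed-precision features mentioned above are also configured through TrainingArguments. A minimal sketch of such a configuration, assuming a CUDA GPU for fp16 (all values are illustrative):
training_args = TrainingArguments(
    output_dir='test_trainer',
    num_train_epochs=3,
    per_device_train_batch_size=8,     # smaller per-device batch size...
    gradient_accumulation_steps=2,     # ...accumulated over 2 steps for an effective batch size of 16
    fp16=True,                         # mixed precision, requires a CUDA GPU
    logging_dir='./logs',
    logging_steps=50,                  # log the training loss every 50 steps
)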
More information about TrainingArguments can be found in the Transformers documentation.
Evaluate
Trainer does not automatically evaluate model performance during training; you must pass it a function that calculates and reports metrics. The Evaluate library provides a simple accuracy function that can be loaded with evaluate.load.
import numpy as np
import evaluate
metric = evaluate.load("accuracy")
Call compute on metric to calculate the accuracy of the predictions. Before passing the predictions to compute, convert the logits to predictions (all Transformers models return logits).
def compute_metrics(pred):
    logits, labels = pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
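To illustrate the return format with dummy values (not taken from the dataset), compute_metrics receives a (logits, labels) tuple and returns a dictionary of metric values:
# Dummy logits for 3 examples and 5 classes, just to show the return format
dummy_logits = np.array([[0.1, 0.2, 3.0, 0.1, 0.1],
                         [2.5, 0.0, 0.1, 0.2, 0.3],
                         [0.0, 0.1, 0.2, 0.3, 4.0]])
dummy_labels = np.array([2, 0, 3])
print(compute_metrics((dummy_logits, dummy_labels)))  # {'accuracy': 0.666...}, 2 of 3 correct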
If you want to monitor evaluation metrics during fine-tuning, set the evaluation_strategy parameter in your training arguments to report the evaluation metric at the end of each epoch.
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments(
output_dir='test_trainer', # output directory
num_train_epochs=3, # total # of training epochs
per_device_train_batch_size=16, # batch size per device during training
per_device_eval_batch_size=64, # batch size for evaluation
warmup_steps=500, # number of warmup steps for learning rate scheduler
weight_decay=0.01, # strength of weight decay
logging_dir='./logs', # directory for storing logs
evaluation_strategy="epoch"
)
Trainer
Create a Trainer object with the model, training arguments, training and evaluation datasets, and the evaluation function.
trainer = Trainer(
model=model,
args=training_args,
train_dataset=small_train_dataset,
eval_dataset=small_eval_dataset,
compute_metrics=compute_metrics,
)
Call train()
to fine-tune the model.
trainer.train()
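Once training finishes, you will usually want to run a final evaluation and save the model. A short sketch using standard Trainer methods (the directory name here is just an example):
# Run evaluation on the eval_dataset that was passed to the Trainer
metrics = trainer.evaluate()
print(metrics)
# Save the fine-tuned model so it can be reloaded later with from_pretrained()
trainer.save_model("test_trainer/final_model")    # hypothetical output path
tokenizer.save_pretrained("test_trainer/final_model")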
TensorFlow
You can also train Hugging Face Transformers models in TensorFlow using the Keras API.
If you want to train a Hugging Face Transformers model with the Keras API, you will need to convert the dataset into a format that Keras can understand. If the dataset is small, convert the entire dataset into a NumPy array and pass it to Keras.
First, load the dataset. Since this is a simple binary text classification task, we will use the CoLA dataset from the GLUE benchmark and just take the training split for now.
from datasets import load_dataset
dataset = load_dataset("glue", "cola")
dataset = dataset["train"] # Just take the training split for now
Next, load a tokenizer and tokenize the data as NumPy arrays. Since the labels are already a list of 0s and 1s, we convert them directly to a NumPy array without tokenization.
import numpy as np
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
labels = np.array(dataset["label"]) # Label is already an array of 0 and 1
Finally, load, compile, and fit the model.
from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
model.compile(optimizer=Adam(3e-5))
model.fit(tokenized_data, labels)
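If the dataset is too large to fit in memory as NumPy arrays, Transformers TensorFlow models also provide prepare_tf_dataset, which streams batches as a tf.data.Dataset. A rough sketch of that alternative (the batch size is illustrative):
def tokenize_dataset(examples):
    # No padding here; prepare_tf_dataset pads each batch on the fly using the tokenizer
    return tokenizer(examples["sentence"])
tokenized_dataset = dataset.map(tokenize_dataset)
tf_dataset = model.prepare_tf_dataset(
    tokenized_dataset,
    batch_size=16,          # illustrative value
    shuffle=True,
    tokenizer=tokenizer,    # used to pad each batch
)
model.fit(tf_dataset)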
PyTorch
You can also fine-tune a Hugging Face Transformers model in native PyTorch.
First, manually post-process the tokenized_datasets created earlier to prepare it for training.
Remove the text column, since the model cannot accept raw text as input.
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
Rename the label column to labels, since the model expects the argument to be named labels.
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
Set the format of the dataset to return PyTorch tensors instead of lists.
tokenized_datasets.set_format("torch")
Next, create a smaller subset of the data set to save time.
small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
Create a DataLoader for the training and test datasets so that you can iterate over batches of data.
from torch.utils.data import DataLoader
train_dataloader = DataLoader(small_train_dataset, shuffle=True, batch_size=8)
eval_dataloader = DataLoader(small_eval_dataset, batch_size=8)
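As a quick check (optional), you can confirm that each batch is a dictionary of tensors the model can consume directly:
batch = next(iter(train_dataloader))
print({k: v.shape for k, v in batch.items()})
# input_ids, token_type_ids and attention_mask have shape [8, 512]; labels has shape [8]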
Load the model and specify the number of expected labels, as before.
from transformers import AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=5)
Create an optimizer and a learning rate scheduler for fine-tuning the model. We will use PyTorch's AdamW optimizer.
from torch.optim import AdamW
optimizer = AdamW(model.parameters(), lr=5e-5)
Create the default learning rate scheduler from Trainer using get_scheduler.
from transformers import get_scheduler
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)
lr_scheduler = get_scheduler(
    name="linear", optimizer=optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)
If a GPU is available, specify the device so that the model runs on the GPU.
import torch
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
model.to(device)
To track training progress, use the tqdm library to add a progress bar based on the number of training steps.
from tqdm.auto import tqdm
progress_bar = tqdm(range(num_training_steps))
model.train()
for epoch in range(num_epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        outputs = model(**batch)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()
        progress_bar.update(1)
Just as you added an evaluation function to Trainer, you need to do the same when writing your own training loop. But instead of calculating and reporting the metric at the end of each epoch, this time you accumulate all the batches with add_batch and calculate the metric at the very end.
import evaluate
metric = evaluate.load("accuracy")
model.eval()
for batch in eval_dataloader:
    batch = {k: v.to(device) for k, v in batch.items()}
    with torch.no_grad():
        outputs = model(**batch)
    logits = outputs.logits
    predictions = torch.argmax(logits, dim=-1)
    metric.add_batch(predictions=predictions, references=batch["labels"])
metric.compute()
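After the loop and the evaluation are done, you can save the fine-tuned model the same way as with Trainer. A minimal sketch (the directory name is just an example):
# Save the fine-tuned weights and tokenizer so they can be reloaded later with from_pretrained()
save_dir = "finetuned-bert-yelp"    # hypothetical output directory
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)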
Additional resources
For more detailed fine-tuning examples, see below.
- Scripts for training common NLP tasks in PyTorch and TensorFlow
- https://huggingface.co/docs/transformers/notebooks - various notebooks on how to fine-tune models for specific tasks in PyTorch and TensorFlow