2023-01-20

Optuna

What is Optuna

Machine learning models have several hyperparameters, and their accuracy can vary greatly depending on the hyperparameter settings. The task of finding the optimal hyperparameters is called hyperparameter tuning. The following search algorithms have been proposed for hyperparameter tuning.

  • Grid Search
  • Random Search
  • Bayesian Optimization

Grid Search exhaustively tries every combination of hyperparameters within a set range. Random Search tries randomly sampled combinations. Bayesian Optimization uses the results of previous trials to choose promising combinations to evaluate next, which makes the search more efficient.
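
As a rough illustration (plain Python, no particular library), the sketch below contrasts Grid Search and Random Search on the toy objective (x - 2) ** 2 over the range [-10, 10]; the objective and search range are made up for this example.

import random

def f(x):
    # Toy objective: minimized at x = 2.
    return (x - 2) ** 2

# Grid Search: evaluate every candidate on a fixed grid.
grid = [-10 + 0.5 * i for i in range(41)]  # -10.0, -9.5, ..., 10.0
best_grid_x = min(grid, key=f)

# Random Search: evaluate the same number of randomly sampled candidates.
samples = [random.uniform(-10, 10) for _ in range(41)]
best_random_x = min(samples, key=f)

print('grid search best x:   %1.3f' % best_grid_x)
print('random search best x: %1.3f' % best_random_x)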

Optuna is a Python framework for hyperparameter tuning. It mainly uses an algorithm called TPE (Tree-structured Parzen Estimator), a type of Bayesian Optimization, to find the optimal value.

Optuna terminology

Optuna uses the following terminology.

  • Study: a series of optimization trials
  • Trial: a single evaluation of the objective function

How to use Optuna

First, install Optuna.

$ pip install optuna

Optuna optimization can be performed in the following three major steps:

  1. Define an objective function that wraps the function to be optimized
  2. Create a Study object with create_study
  3. Run the optimization with the optimize method

The following code will search for x that minimizes (x - 2) ** 2.

import optuna

# step 1
def objective(trial: optuna.Trial):
    x = trial.suggest_uniform('x', -10, 10)
    score = (x - 2) ** 2
    print('x: %1.3f, score: %1.3f' % (x, score))
    return score

# step 2
study = optuna.create_study(direction="minimize")

# step 3
study.optimize(objective, n_trials=100)

study.best_value holds the minimum value of (x - 2) ** 2 found during the search.

>> study.best_value

0.00026655993028283496

study.best_params holds the value of x at which (x - 2) ** 2 was minimized.

>> study.best_params

{'x': 2.016326663170496}

study.best_trial holds the trial that achieved the minimum of (x - 2) ** 2.

>> study.best_trial

FrozenTrial(number=46, state=TrialState.COMPLETE, values=[0.00026655993028283496], datetime_start=datetime.datetime(2023, 1, 20, 11, 6, 46, 200725), datetime_complete=datetime.datetime(2023, 1, 20, 11, 6, 46, 208328), params={'x': 2.016326663170496}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=46, value=None)

study.trials holds all the trials that were run.

>> study.trials

[FrozenTrial(number=0, state=TrialState.COMPLETE, values=[48.70102052494164], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 39, 240177), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 39, 254344), params={'x': 8.978611647379559}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=0, value=None),
.
.
.
 FrozenTrial(number=99, state=TrialState.COMPLETE, values=[1.310544492087495], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 40, 755667), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 40, 763725), params={'x': 0.8552098480125299}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=99, value=None)]

Trial settings

Which parameters to optimize, and how to sample them, are specified inside the objective function, as in the following examples.

optimizer = trial.suggest_categorical('optimizer', ['MomentumSGD', 'Adam'])
num_layers = trial.suggest_int('num_layers', 1, 3)
dropout_rate = trial.suggest_uniform('dropout_rate', 0.0, 1.0)
learning_rate = trial.suggest_loguniform('learning_rate', 1e-5, 1e-2)
drop_path_rate = trial.suggest_discrete_uniform('drop_path_rate', 0.0, 1.0, 0.1)

Optuna offers the following methods for Trial.

  • suggest_categorical(name, choices): Suggest a value for the categorical parameter.
  • suggest_discrete_uniform(name, low, high, q): Suggest a value for the discrete parameter.
  • suggest_float(name, low, high[, step, log]): Suggest a value for the floating point parameter.
  • suggest_int(name, low, high[, step, log]): Suggest a value for the integer parameter.
  • suggest_loguniform(name, low, high): Suggest a value for the continuous parameter, sampled on a log scale.
  • suggest_uniform(name, low, high): Suggest a value for the continuous parameter.

The function arguments are as follows:

  • name: the name of the hyperparameter
  • low: minimum value of the parameter's range
  • high: maximum value of the parameter's range
  • step: the interval between possible values of the parameter
  • q: the interval of discretization
  • log: True if the parameter is sampled on a logarithmic scale
  • choices: a list of categorical values for the parameter
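
Note that in recent versions of Optuna, suggest_uniform, suggest_loguniform, and suggest_discrete_uniform are deprecated in favor of suggest_float. As a sketch (assuming the same trial object inside an objective function as above), the earlier trial settings can be written equivalently as follows.

optimizer = trial.suggest_categorical('optimizer', ['MomentumSGD', 'Adam'])
num_layers = trial.suggest_int('num_layers', 1, 3)
dropout_rate = trial.suggest_float('dropout_rate', 0.0, 1.0)                # uniform
learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True)  # log-uniform
drop_path_rate = trial.suggest_float('drop_path_rate', 0.0, 1.0, step=0.1)  # discrete uniform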

Optuna convenience features

Optuna offers the following convenience features:

  • Pruner
  • Distributed optimization
  • Dashboard functionality

Pruner

Optuna has a feature called a Pruner, which automatically stops (prunes) trials that are unlikely to produce good results.

study = optuna.create_study(
    pruner=optuna.pruners.MedianPruner(),
)

The code above specifies the MedianPruner, but other pruners are also available. See the following official documentation for details.

https://optuna.readthedocs.io/en/stable/reference/pruners.html
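
A pruner only takes effect when the objective function reports intermediate values and asks whether the trial should be stopped. The following is a minimal sketch; the per-epoch score is made up for illustration and is not a real training loop.

import optuna

def objective(trial: optuna.Trial):
    x = trial.suggest_float('x', -10, 10)
    for epoch in range(10):
        # Report a (made-up) intermediate score for this epoch.
        intermediate_score = (x - 2) ** 2 / (epoch + 1)
        trial.report(intermediate_score, epoch)
        # Ask the pruner whether this trial should be stopped early.
        if trial.should_prune():
            raise optuna.TrialPruned()
    return (x - 2) ** 2

study = optuna.create_study(
    direction="minimize",
    pruner=optuna.pruners.MedianPruner(),
)
study.optimize(objective, n_trials=50)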

Distributed optimization

By specifying study_name and storage as arguments to create_study, trial history can be shared among processes and distributed processing can be easily implemented.

study = optuna.create_study(
    study_name="example-study",
    storage="sqlite:///example.db",
    load_if_exists=True,
)

By setting load_if_exists to True, you can also allow loading and resuming when a Study of the same name already exists in the DB.
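
For example, if the script below is saved as optimize.py (a hypothetical file name) and launched in several terminals at once, each process records its trials in the shared SQLite file and they jointly optimize the same study. SQLite is convenient for a quick test on a single machine; for serious distributed setups the Optuna documentation recommends a database server such as MySQL or PostgreSQL.

import optuna

def objective(trial: optuna.Trial):
    x = trial.suggest_float('x', -10, 10)
    return (x - 2) ** 2

# Every process that uses the same study_name and storage URL
# contributes trials to the same study.
study = optuna.create_study(
    study_name="example-study",
    storage="sqlite:///example.db",
    load_if_exists=True,
)
study.optimize(objective, n_trials=100)

$ python optimize.py   # run this command in multiple terminals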

Dashboard functionality

Optuna provides a dashboard feature that allows you to track the progress of your search.

https://github.com/optuna/optuna-dashboard
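
The dashboard is installed as a separate package and pointed at the storage URL used by the study (here, the SQLite file from the previous example).

$ pip install optuna-dashboard
$ optuna-dashboard sqlite:///example.db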

Optimizing PyTorch model

In the following example, we optimize the validation accuracy of fashion product recognition using PyTorch and FashionMNIST.

import os

import optuna
from optuna.trial import TrialState
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets
from torchvision import transforms


DEVICE = torch.device("cpu")
BATCHSIZE = 128
CLASSES = 10
DIR = os.getcwd()
EPOCHS = 10
N_TRAIN_EXAMPLES = BATCHSIZE * 30
N_VALID_EXAMPLES = BATCHSIZE * 10


def define_model(trial: optuna.Trial):
    # We optimize the number of layers, hidden units and dropout ratio in each layer.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    layers = []

    in_features = 28 * 28
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i), 4, 128)
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        p = trial.suggest_float("dropout_l{}".format(i), 0.2, 0.5)
        layers.append(nn.Dropout(p))

        in_features = out_features
    layers.append(nn.Linear(in_features, CLASSES))
    layers.append(nn.LogSoftmax(dim=1))

    return nn.Sequential(*layers)


def get_mnist():
    # Load FashionMNIST dataset.
    train_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=True, download=True, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )
    valid_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=False, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )

    return train_loader, valid_loader


def objective(trial: optuna.Trial):

    # Generate the model.
    model = define_model(trial).to(DEVICE)

    # Generate the optimizers.
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "RMSprop", "SGD"])
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)

    # Get the FashionMNIST dataset.
    train_loader, valid_loader = get_mnist()

    # Training of the model.
    for epoch in range(EPOCHS):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            # Limiting training data for faster epochs.
            if batch_idx * BATCHSIZE >= N_TRAIN_EXAMPLES:
                break

            data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)

            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()

        # Validation of the model.
        model.eval()
        correct = 0
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(valid_loader):
                # Limiting validation data.
                if batch_idx * BATCHSIZE >= N_VALID_EXAMPLES:
                    break
                data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)
                output = model(data)
                # Get the index of the max log-probability.
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        accuracy = correct / min(len(valid_loader.dataset), N_VALID_EXAMPLES)

        trial.report(accuracy, epoch)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100, timeout=600)

pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])

print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))
The output is as follows:

Study statistics: 
  Number of finished trials:  100
  Number of pruned trials:  64
  Number of complete trials:  36
Best trial:
  Value:  0.8484375
  Params:
    n_layers: 1
    n_units_l0: 77
    dropout_l0: 0.2621844457931539
    optimizer: Adam
    lr: 0.0051477826780949205

Optimizing LightGBM

The following example optimizes the validation accuracy of cancer detection using LightGBM.

import numpy as np
import optuna

import lightgbm as lgb
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split

def objective(trial):
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)
    return accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
The optimization log is as follows:

[I 2023-01-20 09:18:25,197] A new study created in memory with name: no-name-26038220-3fba-4ada-9237-9ad9e0a7eff4
[I 2023-01-20 09:18:25,278] Trial 0 finished with value: 0.951048951048951 and parameters: {'lambda_l1': 3.6320373475789714e-05, 'lambda_l2': 0.0001638841686303377, 'num_leaves': 52, 'feature_fraction': 0.5051855392259837, 'bagging_fraction': 0.48918754678745996, 'bagging_freq': 4, 'min_child_samples': 30}. Best is trial 0 with value: 0.951048951048951.
.
.
.
[I 2023-01-20 09:18:37,148] Trial 99 finished with value: 0.972027972027972 and parameters: {'lambda_l1': 4.921752856772178e-06, 'lambda_l2': 5.0633857392202624e-08, 'num_leaves': 28, 'feature_fraction': 0.48257699231443446, 'bagging_fraction': 0.7810382257111896, 'bagging_freq': 3, 'min_child_samples': 28}. Best is trial 36 with value: 0.993006993006993.

print(f"Number of finished trials: {len(study.trials)}")
print("Best trial:")

trial = study.best_trial

print(f"  Value: {trial.value}")

print("  Params: ")
for key, value in trial.params.items():
    print(f"    {key}: {value}")

Number of finished trials: 100
Best trial:
  Value: 0.993006993006993
  Params:
    lambda_l1: 2.2820624207211886e-06
    lambda_l2: 4.100655307616414e-08
    num_leaves: 253
    feature_fraction: 0.6477416602072985
    bagging_fraction: 0.7393534933706116
    bagging_freq: 5
    min_child_samples: 36

LightGBM Tuner

For LightGBM only, Optuna offers the LightGBM Tuner, which makes LightGBM tuning easier.

However, the LightGBM Tuner only tunes the following hyperparameters.

  • lambda_l1
  • lambda_l2
  • num_leaves
  • feature_fraction
  • bagging_fraction
  • bagging_freq
  • min_child_samples

The LightGBM Tuner is used in the same way as lightgbm.train, as in the following example.

import numpy as np
import optuna.integration.lightgbm as lgb

from lightgbm import early_stopping
from lightgbm import log_evaluation
import sklearn.datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


if __name__ == "__main__":
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, val_x, train_y, val_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)
    dval = lgb.Dataset(val_x, label=val_y)

    params = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
    }

    model = lgb.train(
        params,
        dtrain,
        valid_sets=[dtrain, dval],
        callbacks=[early_stopping(100), log_evaluation(100)],
    )

    prediction = np.rint(model.predict(val_x, num_iteration=model.best_iteration))
    accuracy = accuracy_score(val_y, prediction)

    best_params = model.params
    print("Best params:", best_params)
    print("  Accuracy = {}".format(accuracy))
    print("  Params: ")
    for key, value in best_params.items():
        print("    {}: {}".format(key, value))
The output is as follows:

Best params: {'objective': 'binary', 'metric': 'binary_logloss', 'verbosity': -1, 'boosting_type': 'gbdt', 'feature_pre_filter': False, 'lambda_l1': 3.9283033758323693e-07, 'lambda_l2': 0.11914982777201996, 'num_leaves': 4, 'feature_fraction': 0.4, 'bagging_fraction': 0.46448877892449625, 'bagging_freq': 3, 'min_child_samples': 20}
  Accuracy = 0.9790209790209791
  Params:
    objective: binary
    metric: binary_logloss
    verbosity: -1
    boosting_type: gbdt
    feature_pre_filter: False
    lambda_l1: 3.9283033758323693e-07
    lambda_l2: 0.11914982777201996
    num_leaves: 4
    feature_fraction: 0.4
    bagging_fraction: 0.46448877892449625
    bagging_freq: 3
    min_child_samples: 20

Other optimization examples

The following GitHub repository provides a wealth of examples of applying Optuna to other models and frameworks.

https://github.com/optuna/optuna-examples

References

https://optuna.readthedocs.io/en/stable/index.html
https://optuna.readthedocs.io/en/stable/reference/generated/optuna.integration.lightgbm.LightGBMTuner.html
https://optuna.readthedocs.io/en/stable/reference/generated/optuna.trial.Trial.html#optuna.trial.Trial
https://github.com/optuna/optuna-examples
https://github.com/optuna/optuna-dashboard

Ryusei Kakujo
