What is Optuna
Machine learning models have many hyperparameters, and their accuracy can vary greatly depending on how those hyperparameters are set. The task of finding the optimal hyperparameters is called hyperparameter tuning. The following search algorithms have been proposed for hyperparameter tuning:
- Grid Search
- Random Search
- Bayesian Optimization
Grid Search tries all combinations of hyperparameters within a set range. Random Search tries random combinations of hyperparameters. Bayesian Optimization efficiently searches for hyperparameter combinations based on the results of previous trials.
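As a rough illustration of the difference (not from the original article; the toy objective and ranges below are arbitrary), Grid Search enumerates a fixed set of candidate values while Random Search samples combinations at random:
import itertools
import random

def score(x, y):
    # Toy objective standing in for a model's validation error.
    return (x - 2) ** 2 + (y + 1) ** 2

# Grid Search: try every combination of the predefined candidate values.
grid_x = [-10, -5, 0, 5, 10]
grid_y = [-2, -1, 0, 1, 2]
best_grid = min(itertools.product(grid_x, grid_y), key=lambda p: score(*p))

# Random Search: try randomly sampled combinations from the same ranges.
candidates = [(random.uniform(-10, 10), random.uniform(-2, 2)) for _ in range(25)]
best_random = min(candidates, key=lambda p: score(*p))

print("grid best:", best_grid, "random best:", best_random)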
Optuna is a Python framework for hyperparameter tuning. It mainly uses an algorithm called TPE (Tree-structured Parzen Estimator), a type of Bayesian Optimization, to find the optimal value.
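As a small aside (not in the original text), TPESampler is Optuna's default sampler for single-objective studies, and it can also be passed explicitly when creating a study:
import optuna

# Equivalent to optuna.create_study() with default settings: TPE is the default sampler.
study = optuna.create_study(sampler=optuna.samplers.TPESampler())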
Optuna terminology
Optuna uses the following terminology:
- Study: a series of optimization trials
- Trial: a single trial run of the objective function
How to use Optuna
First, install Optuna.
$ pip install optuna
Optuna optimization can be performed in the following three major steps:
- Define an objective function that computes and returns the value to be optimized
- Create a Study object with create_study
- Run the search with the optimize method
The following code searches for the value of x that minimizes (x - 2) ** 2.
import optuna

# step 1
def objective(trial: optuna.Trial):
    x = trial.suggest_uniform('x', -10, 10)
    score = (x - 2) ** 2
    print('x: %1.3f, score: %1.3f' % (x, score))
    return score

# step 2
study = optuna.create_study(direction="minimize")

# step 3
study.optimize(objective, n_trials=100)
study.best_value contains the minimum value of (x - 2) ** 2 found during the search.
>> study.best_value
0.00026655993028283496
study.best_params contains the value of x that gave the minimum of (x - 2) ** 2.
>> study.best_params
{'x': 2.016326663170496}
study.best_trial contains the trial that achieved the minimum of (x - 2) ** 2.
>> study.best_trial
FrozenTrial(number=46, state=TrialState.COMPLETE, values=[0.00026655993028283496], datetime_start=datetime.datetime(2023, 1, 20, 11, 6, 46, 200725), datetime_complete=datetime.datetime(2023, 1, 20, 11, 6, 46, 208328), params={'x': 2.016326663170496}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=46, value=None)
study.trials contains all the trials that were performed.
>> study.trials
[FrozenTrial(number=0, state=TrialState.COMPLETE, values=[48.70102052494164], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 39, 240177), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 39, 254344), params={'x': 8.978611647379559}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=0, value=None),
.
.
.
FrozenTrial(number=99, state=TrialState.COMPLETE, values=[1.310544492087495], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 40, 755667), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 40, 763725), params={'x': 0.8552098480125299}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=99, value=None)]
Trial settings
Which parameters to optimize, and how to sample them, are specified through the Trial object as shown below.
optimizer = trial.suggest_categorical('optimizer', ['MomentumSGD', 'Adam'])
num_layers = trial.suggest_int('num_layers', 1, 3)
dropout_rate = trial.suggest_uniform('dropout_rate', 0.0, 1.0)
learning_rate = trial.suggest_loguniform('learning_rate', 1e-5, 1e-2)
drop_path_rate = trial.suggest_discrete_uniform('drop_path_rate', 0.0, 1.0, 0.1)
Optuna offers the following methods for Trial.
Method | Description |
---|---|
suggest_categorical(name, choices) | Suggest a value for a categorical parameter. |
suggest_discrete_uniform(name, low, high, q) | Suggest a value for a discrete parameter. |
suggest_float(name, low, high[, step, log]) | Suggest a value for a floating-point parameter. |
suggest_int(name, low, high[, step, log]) | Suggest a value for an integer parameter. |
suggest_loguniform(name, low, high) | Suggest a value for a continuous parameter sampled in the log domain. |
suggest_uniform(name, low, high) | Suggest a value for a continuous parameter. |
The function arguments are as follows:
- name: the name of the hyperparameter
- low: minimum value of the parameter's range
- high: maximum value of the parameter's range
- step: the interval between possible values of the parameter
- q: the interval of discretization
- log: true if the parameter is sampled from the logarithmic domain
- choices: a list of categorical values for the parameter
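As a hedged sketch of how these arguments are used (the parameter names and ranges below are made up for illustration), suggest_float and suggest_int also cover the log and step variants:
import optuna

def objective(trial: optuna.Trial):
    # log=True samples on a logarithmic scale, like suggest_loguniform.
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    # step discretizes the range, like suggest_discrete_uniform with q=0.1.
    drop_path_rate = trial.suggest_float("drop_path_rate", 0.0, 1.0, step=0.1)
    # Integer parameter restricted to even values between 16 and 64.
    num_units = trial.suggest_int("num_units", 16, 64, step=2)
    # Dummy score so the sketch runs end to end.
    return learning_rate + drop_path_rate + num_units

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=10)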
Optuna convenience features
Optuna offers the following convenience features:
- Pruner
- Distributed optimization
- Dashboard functionality
Pruner
Optuna has a feature called Pruner that can automatically stop unpromising trials early.
study = optuna.create_study(
    pruner=optuna.pruners.MedianPruner(),
)
The above code specifies a Pruner called MedianPruner, but other Pruners are also available. Please refer to the official Optuna documentation for details.
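As a minimal sketch of how a pruner interacts with the objective (an illustration, not from the original; the decaying toy score is arbitrary), the objective reports intermediate values with trial.report and raises TrialPruned when trial.should_prune returns True:
import optuna

def objective(trial: optuna.Trial):
    x = trial.suggest_float("x", -10, 10)
    for step in range(20):
        # Report a toy intermediate value that shrinks as the "training" progresses.
        intermediate = (x - 2) ** 2 / (step + 1)
        trial.report(intermediate, step)
        # The pruner compares this trial's intermediate values against other trials.
        if trial.should_prune():
            raise optuna.TrialPruned()
    return (x - 2) ** 2

study = optuna.create_study(direction="minimize", pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=30)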
Distributed optimization
By specifying study_name and storage as arguments to create_study, the trial history can be shared among processes, and distributed optimization can be implemented easily.
study = optuna.create_study(
    study_name="example-study",
    storage="sqlite:///example.db",
    load_if_exists=True,
)
By setting load_if_exists to True, you can also load and resume a Study of the same name if it already exists in the DB.
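For example, a second worker process can attach to the same study through the shared storage and contribute trials (a sketch assuming the example.db storage above and reusing the toy objective from earlier):
import optuna

def objective(trial: optuna.Trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

# Attach to the study created by another process via the shared SQLite storage.
study = optuna.load_study(study_name="example-study", storage="sqlite:///example.db")
# Trials run here are recorded in the same DB as trials from other workers.
study.optimize(objective, n_trials=50)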
Dashboard functionality
Optuna provides a dashboard feature that allows you to track the progress of your search.
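One common way to use it (assuming the separately distributed optuna-dashboard package, which is not covered here) is to install the package and point it at the study's storage:
$ pip install optuna-dashboard
$ optuna-dashboard sqlite:///example.db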
Optimizing PyTorch model
In the following example, we optimize the validation accuracy of fashion product recognition using PyTorch and FashionMNIST.
import os

import optuna
from optuna.trial import TrialState
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets
from torchvision import transforms

DEVICE = torch.device("cpu")
BATCHSIZE = 128
CLASSES = 10
DIR = os.getcwd()
EPOCHS = 10
N_TRAIN_EXAMPLES = BATCHSIZE * 30
N_VALID_EXAMPLES = BATCHSIZE * 10


def define_model(trial: optuna.Trial):
    # We optimize the number of layers, hidden units and dropout ratio in each layer.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    layers = []

    in_features = 28 * 28
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i), 4, 128)
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        p = trial.suggest_float("dropout_l{}".format(i), 0.2, 0.5)
        layers.append(nn.Dropout(p))

        in_features = out_features
    layers.append(nn.Linear(in_features, CLASSES))
    layers.append(nn.LogSoftmax(dim=1))

    return nn.Sequential(*layers)


def get_mnist():
    # Load FashionMNIST dataset.
    train_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=True, download=True, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )
    valid_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=False, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )
    return train_loader, valid_loader


def objective(trial: optuna.Trial):
    # Generate the model.
    model = define_model(trial).to(DEVICE)

    # Generate the optimizers.
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "RMSprop", "SGD"])
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)

    # Get the FashionMNIST dataset.
    train_loader, valid_loader = get_mnist()

    # Training of the model.
    for epoch in range(EPOCHS):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            # Limiting training data for faster epochs.
            if batch_idx * BATCHSIZE >= N_TRAIN_EXAMPLES:
                break

            data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)

            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()

        # Validation of the model.
        model.eval()
        correct = 0
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(valid_loader):
                # Limiting validation data.
                if batch_idx * BATCHSIZE >= N_VALID_EXAMPLES:
                    break
                data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)
                output = model(data)
                # Get the index of the max log-probability.
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        accuracy = correct / min(len(valid_loader.dataset), N_VALID_EXAMPLES)

        trial.report(accuracy, epoch)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return accuracy


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100, timeout=600)

pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])

print("Study statistics: ")
print(" Number of finished trials: ", len(study.trials))
print(" Number of pruned trials: ", len(pruned_trials))
print(" Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print(" Value: ", trial.value)

print(" Params: ")
for key, value in trial.params.items():
    print(" {}: {}".format(key, value))
Study statistics:
Number of finished trials: 100
Number of pruned trials: 64
Number of complete trials: 36
Best trial:
Value: 0.8484375
Params:
n_layers: 1
n_units_l0: 77
dropout_l0: 0.2621844457931539
optimizer: Adam
lr: 0.0051477826780949205
Optimizing LightGBM
The following example optimizes the validation accuracy of cancer detection using LightGBM.
import numpy as np
import optuna
import lightgbm as lgb
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split
def objective(trial):
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)
    return accuracy
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
[I 2023-01-20 09:18:25,197] A new study created in memory with name: no-name-26038220-3fba-4ada-9237-9ad9e0a7eff4
[I 2023-01-20 09:18:25,278] Trial 0 finished with value: 0.951048951048951 and parameters: {'lambda_l1': 3.6320373475789714e-05, 'lambda_l2': 0.0001638841686303377, 'num_leaves': 52, 'feature_fraction': 0.5051855392259837, 'bagging_fraction': 0.48918754678745996, 'bagging_freq': 4, 'min_child_samples': 30}. Best is trial 0 with value: 0.951048951048951.
.
.
.
[I 2023-01-20 09:18:37,148] Trial 99 finished with value: 0.972027972027972 and parameters: {'lambda_l1': 4.921752856772178e-06, 'lambda_l2': 5.0633857392202624e-08, 'num_leaves': 28, 'feature_fraction': 0.48257699231443446, 'bagging_fraction': 0.7810382257111896, 'bagging_freq': 3, 'min_child_samples': 28}. Best is trial 36 with value: 0.993006993006993.
print(f"Number of finished trials: {len(study.trials)}")
print("Best trial:")
trial = study.best_trial
print(f" Value: {trial.value}")
print(" Params: ")
for key, value in trial.params.items():
    print(f" {key}: {value}")
Number of finished trials: 100
Best trial:
Value: 0.993006993006993
Params:
lambda_l1: 2.2820624207211886e-06
lambda_l2: 4.100655307616414e-08
num_leaves: 253
feature_fraction: 0.6477416602072985
bagging_fraction: 0.7393534933706116
bagging_freq: 5
min_child_samples: 36
LightGBM Tuner
For LightGBM only, Optuna offers the LightGBM Tuner, which makes LightGBM tuning easier.
However, the LightGBM Tuner only tunes the following hyperparameters.
- lambda_l1
- lambda_l2
- num_leaves
- feature_fraction
- bagging_fraction
- bagging_freq
- min_child_samples
import numpy as np
import optuna.integration.lightgbm as lgb
from lightgbm import early_stopping
from lightgbm import log_evaluation
import sklearn.datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
if __name__ == "__main__":
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, val_x, train_y, val_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)
    dval = lgb.Dataset(val_x, label=val_y)

    params = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
    }

    model = lgb.train(
        params,
        dtrain,
        valid_sets=[dtrain, dval],
        callbacks=[early_stopping(100), log_evaluation(100)],
    )

    prediction = np.rint(model.predict(val_x, num_iteration=model.best_iteration))
    accuracy = accuracy_score(val_y, prediction)

    best_params = model.params
    print("Best params:", best_params)
    print(" Accuracy = {}".format(accuracy))
    print(" Params: ")
    for key, value in best_params.items():
        print(" {}: {}".format(key, value))
Best params: {'objective': 'binary', 'metric': 'binary_logloss', 'verbosity': -1, 'boosting_type': 'gbdt', 'feature_pre_filter': False, 'lambda_l1': 3.9283033758323693e-07, 'lambda_l2': 0.11914982777201996, 'num_leaves': 4, 'feature_fraction': 0.4, 'bagging_fraction': 0.46448877892449625, 'bagging_freq': 3, 'min_child_samples': 20}
Accuracy = 0.9790209790209791
Params:
objective: binary
metric: binary_logloss
verbosity: -1
boosting_type: gbdt
feature_pre_filter: False
lambda_l1: 3.9283033758323693e-07
lambda_l2: 0.11914982777201996
num_leaves: 4
feature_fraction: 0.4
bagging_fraction: 0.46448877892449625
bagging_freq: 3
min_child_samples: 20
Other optimization examples
The official Optuna examples repository on GitHub provides a wealth of examples of applying Optuna to other models and frameworks.
References