2023-01-20

Optuna

What is Optuna

Machine learning models have several hyperparameters, and their accuracy can vary greatly depending on the hyperparameter settings. The task of finding the optimal hyperparameters is called hyperparameter tuning. The following search algorithms have been proposed for hyperparameter tuning.

Grid Search
Random Search
Bayesian Optimization

Grid Search tries all combinations of hyperparameters within a set range. Random Search tries random combinations of hyperparameters. Bayesian Optimization efficiently searches for hyperparameter combinations based on previous hyperparameter combination trials.

Optuna is a Python framework for hyperparameter tuning. It mainly uses an algorithm called TPE (Tree-structured Parzen Estimator), a type of Bayesian Optimization, to find the optimal value.

Optuna terminology

Optuna has the following terms.

Study: a series of optimization trials
Trial: a single trial run of the objective function

How to use Optuna

First, install Optuna.

$ pip install optuna

Optuna optimization can be performed in the following three major steps:

Define an objective function that wraps the objective function
Create variables of type Study
Optimize with the optimize method

The following code will search for x that minimizes (x - 2) ** 2.

import optuna

# step 1
def objective(trial: optuna.Trial):
    x = trial.suggest_uniform('x', -10, 10)
    score = (x - 2) ** 2
    print('x: %1.3f, score: %1.3f' % (x, score))
    return score

# step 2
study = optuna.create_study(direction="minimize")

# step 3
study.optimize(objective, n_trials=100)

The study.best_value contains the minimum (x - 2) ** 2.

>> study.best_value

0.00026655993028283496

The study.best_params contains the parameters of x at the minimum (x - 2) ** 2.

>> study.best_params

{'x': 2.016326663170496}

The study.best_trial contains the trials for the minimum (x - 2) ** 2.

>> study.best_trial

FrozenTrial(number=46, state=TrialState.COMPLETE, values=[0.00026655993028283496], datetime_start=datetime.datetime(2023, 1, 20, 11, 6, 46, 200725), datetime_complete=datetime.datetime(2023, 1, 20, 11, 6, 46, 208328), params={'x': 2.016326663170496}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=46, value=None)

The study.trials contains the trials that were performed.

>> study.trials

[FrozenTrial(number=0, state=TrialState.COMPLETE, values=[48.70102052494164], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 39, 240177), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 39, 254344), params={'x': 8.978611647379559}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=0, value=None),
.
.
.
 FrozenTrial(number=99, state=TrialState.COMPLETE, values=[1.310544492087495], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 40, 755667), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 40, 763725), params={'x': 0.8552098480125299}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=99, value=None)]

Trial settings

The settings for which parameters to optimize and how to optimize them are described below.

optimizer = trial.suggest_categorical('optimizer', ['MomentumSGD', 'Adam'])
num_layers = trial.suggest_int('num_layers', 1, 3)
dropout_rate = trial.suggest_uniform('dropout_rate', 0.0, 1.0)
learning_rate = trial.suggest_loguniform('learning_rate', 1e-5, 1e-2)
drop_path_rate = trial.suggest_discrete_uniform('drop_path_rate', 0.0, 1.0, 0.1)

Optuna offers the following methods for Trial.

Method	Description
`suggest_categorical`(name, choices)	Suggest a value for the categorical parameter.
`suggest_discrete_uniform`(name, low, high, q)	Suggest a value for the discrete parameter.
`suggest_float`(name, low, high, [, step, log])	Suggest a value for the floating point parameter.
`suggest_int`(name, low, high[, step, log])	Suggest a value for the integer parameter.
`suggest_loguniform`(name, low, high)	Suggest a value for the continuous parameter.
`suggest_uniform`(name, low, high)	Suggest a value for the continuous parameter.

The function arguments are as follows:

name: the name of the hyperparameter
low: minimum value of the parameter's range
high: maximum value of the parameter's range
step: the interval between possible values of the parameter
q: the interval of discretization
log: true if the parameter is sampled from the logarithmic domain
choices: a list of categorical values for the parameter

Optuna convenience features

Optuna offers the following convenience features:

Pruner
Distributed optimization
Dashboard functionality

Pruner

Optuna has a feature called Pruner that can automatically suspend a trial with low prospects.

study = optuna.create_study(
    pruner=optuna.pruners.MedianPruner(),
)

The above code specifies a Pruner called MedianPruner(), but other Pruners exist. Please refer to the following official document for details.

Distributed optimization

By specifying study_name and storage as arguments to create_study, trial history can be shared among processes and distributed processing can be easily implemented.

Optuna

What is Optuna

Optuna terminology

How to use Optuna

Trial settings

Optuna convenience features

Pruner

Distributed optimization

Dashboard functionality

Optimizing PyTorch model

Optimizing LightGBM

LightGBM Tuner

Other optimization examples

References

Kedro CLI

Optuna + MLflow

Ryusei Kakujo