2023-01-20

Optuna

What is Optuna

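Machine learning models have several hyperparameters, and a model's performance can vary considerably depending on how they are set. The task of finding the best hyperparameters is called hyperparameter tuning, and several search algorithms have been proposed for it, including the following.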

Grid Search
Random Search
Bayesian Optimization

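Grid search tries every combination of hyperparameter values within the specified ranges. Random search tries randomly chosen combinations. Bayesian optimization uses the results of previous trials to decide which combination to try next, so it can explore the search space more efficiently.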
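As a rough illustration of the difference between the first two strategies, the sketch below runs grid search and random search on a hypothetical two-parameter toy objective whose minimum is at (2, -1), giving both the same budget of 121 evaluations. It uses only the standard library; the function names are made up for this example and are not Optuna APIs.

```python
import itertools
import random


def objective(x: float, y: float) -> float:
    # Hypothetical toy objective with its minimum at (x, y) = (2, -1)
    return (x - 2) ** 2 + (y + 1) ** 2


def grid_search(xs, ys):
    # Try every combination in the Cartesian product of the candidate values
    best = None
    for x, y in itertools.product(xs, ys):
        score = objective(x, y)
        if best is None or score < best[0]:
            best = (score, x, y)
    return best


def random_search(n_trials, low, high, seed=0):
    # Draw each hyperparameter independently and uniformly at random
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        x, y = rng.uniform(low, high), rng.uniform(low, high)
        score = objective(x, y)
        if best is None or score < best[0]:
            best = (score, x, y)
    return best


grid = [float(v) for v in range(-10, 11, 2)]  # -10, -8, ..., 10 (11 values)
print(grid_search(grid, grid))      # (1.0, 2.0, -2.0) after 11 * 11 = 121 evaluations
print(random_search(121, -10, 10))  # same budget, result depends on the seed
```

The grid here contains 2 for x but not -1 for y, so grid search cannot score below 1.0 however many times it is run; random search is not tied to a lattice, but what it finds depends on the random draws.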

Optuna is a Python framework for hyperparameter tuning. It mainly uses TPE (Tree-structured Parzen Estimator), a form of Bayesian optimization, to search for the best values.
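To show the core idea behind TPE, here is a heavily simplified one-dimensional sketch. It is not Optuna's TPESampler: it assumes a fixed kernel bandwidth, a fixed good/bad split ratio gamma and a handful of random startup trials, and every parameter name in it is hypothetical. After the startup trials it splits past trials into a "good" group (the best gamma fraction) and a "bad" group, models each with a Parzen (kernel density) estimator l(x) and g(x), draws candidates from l(x), and evaluates the candidate with the largest ratio l(x) / g(x).

```python
import math
import random


def gaussian_pdf(x, mu, sigma):
    # Density of the normal distribution N(mu, sigma^2) at x
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def parzen_density(x, points, bandwidth):
    # Parzen estimator: the average of Gaussian kernels centred on the observed points
    return sum(gaussian_pdf(x, p, bandwidth) for p in points) / len(points)


def tpe_minimize(objective, low, high, n_trials=100, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    rng = random.Random(seed)
    history = []  # (x, score) pairs of all finished trials
    bandwidth = (high - low) / 10  # fixed kernel width (a simplification)
    for t in range(n_trials):
        if t < n_startup:
            x = rng.uniform(low, high)  # random startup trials
        else:
            ranked = sorted(history, key=lambda h: h[1])  # lower score is better
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [h[0] for h in ranked[:n_good]]  # points that define l(x)
            bad = [h[0] for h in ranked[n_good:]]   # points that define g(x)
            # Draw candidates from l(x): pick a good point, add kernel noise, clip to range
            candidates = [
                min(max(rng.gauss(rng.choice(good), bandwidth), low), high)
                for _ in range(n_candidates)
            ]
            # Evaluate the candidate with the highest l(x) / g(x) ratio
            x = max(
                candidates,
                key=lambda c: parzen_density(c, good, bandwidth)
                / (parzen_density(c, bad, bandwidth) + 1e-12),
            )
        history.append((x, objective(x)))
    return min(history, key=lambda h: h[1])


best_x, best_score = tpe_minimize(lambda x: (x - 2) ** 2, -10, 10)
print(f"x = {best_x:.3f}, score = {best_score:.5f}")
```

Optuna's actual TPESampler is more elaborate (for example, its kernel bandwidths are not fixed and it handles many parameters and parameter types), so treat this only as an illustration of the l(x) / g(x) idea.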

Optuna terminology

Optuna uses the following terms.

Study: an optimization session, i.e., a series of trials
Trial: a single execution of the objective function
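The relationship between a Study and its Trials can be pictured with a toy model. The classes below, MiniStudy and MiniTrial, are hypothetical stand-ins written for this explanation and are far simpler than optuna.Study and optuna.Trial; in particular they only sample uniformly at random. Each call of the objective function gets its own trial object, and the study keeps the list of all trials and picks the best one.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class MiniTrial:
    # One execution of the objective function: its number, parameters and result
    number: int
    rng: random.Random
    params: dict = field(default_factory=dict)
    value: Optional[float] = None

    def suggest_float(self, name: str, low: float, high: float) -> float:
        # Uniform random sampling only (no TPE), recorded under the given name
        self.params[name] = self.rng.uniform(low, high)
        return self.params[name]


@dataclass
class MiniStudy:
    # The whole optimization session: it owns every trial that has been run
    direction: str = "minimize"
    seed: int = 0
    trials: List[MiniTrial] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def optimize(self, objective: Callable[[MiniTrial], float], n_trials: int) -> None:
        for _ in range(n_trials):
            trial = MiniTrial(number=len(self.trials), rng=self._rng)
            trial.value = objective(trial)  # one trial = one objective call
            self.trials.append(trial)

    @property
    def best_trial(self) -> MiniTrial:
        pick = min if self.direction == "minimize" else max
        return pick(self.trials, key=lambda t: t.value)


def objective(trial: MiniTrial) -> float:
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2


study = MiniStudy(direction="minimize")
study.optimize(objective, n_trials=20)
print(len(study.trials), study.best_trial.number, study.best_trial.params)
```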

How to use Optuna

First, install Optuna.

$ pip install optuna

Optimization with Optuna consists of three main steps.

Define an objective function that wraps the function to be optimized
Create a Study object
Run the optimization with the optimize method

The following code searches for the x that minimizes (x - 2) ** 2.

import optuna

# step 1
def objective(trial: optuna.Trial):
    x = trial.suggest_float('x', -10, 10)  # suggest_uniform is deprecated since Optuna 3.0
    score = (x - 2) ** 2
    print('x: %1.3f, score: %1.3f' % (x, score))
    return score

# step 2
study = optuna.create_study(direction="minimize")

# step 3
study.optimize(objective, n_trials=100)

study.best_value holds the smallest value of (x - 2) ** 2 that was found.

>> study.best_value

0.00026655993028283496

study.best_params holds the parameters of the trial that produced that minimum, which here is just the value of x.

>> study.best_params

{'x': 2.016326663170496}

study.best_trial holds the full record of the trial that produced that minimum.

>> study.best_trial

FrozenTrial(number=46, state=TrialState.COMPLETE, values=[0.00026655993028283496], datetime_start=datetime.datetime(2023, 1, 20, 11, 6, 46, 200725), datetime_complete=datetime.datetime(2023, 1, 20, 11, 6, 46, 208328), params={'x': 2.016326663170496}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=46, value=None)

study.trials holds the records of every trial that was run.

>> study.trials

[FrozenTrial(number=0, state=TrialState.COMPLETE, values=[48.70102052494164], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 39, 240177), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 39, 254344), params={'x': 8.978611647379559}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=0, value=None),
.
.
.
 FrozenTrial(number=99, state=TrialState.COMPLETE, values=[1.310544492087495], datetime_start=datetime.datetime(2023, 1, 20, 6, 4, 40, 755667), datetime_complete=datetime.datetime(2023, 1, 20, 6, 4, 40, 763725), params={'x': 0.8552098480125299}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'x': FloatDistribution(high=10.0, log=False, low=-10.0, step=None)}, trial_id=99, value=None)]

Configuring a Trial

Which parameters to optimize, and how to sample them, is specified as follows.

# Inside objective(trial). suggest_float replaces the deprecated
# suggest_uniform / suggest_loguniform / suggest_discrete_uniform (Optuna 3.0+).
optimizer = trial.suggest_categorical('optimizer', ['MomentumSGD', 'Adam'])
num_layers = trial.suggest_int('num_layers', 1, 3)
dropout_rate = trial.suggest_float('dropout_rate', 0.0, 1.0)
learning_rate = trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True)
drop_path_rate = trial.suggest_float('drop_path_rate', 0.0, 1.0, step=0.1)

The suggest methods provided by Optuna's Trial are listed below.

Method	Description
suggest_categorical(name, choices)	Suggest a value for the categorical parameter.
suggest_discrete_uniform(name, low, high, q)	Suggest a value for the discrete parameter. (Deprecated since v3.0; use suggest_float with step.)
suggest_float(name, low, high[, step, log])	Suggest a value for the floating point parameter.
suggest_int(name, low, high[, step, log])	Suggest a value for the integer parameter.
suggest_loguniform(name, low, high)	Suggest a value for the continuous parameter. (Deprecated since v3.0; use suggest_float with log=True.)
suggest_uniform(name, low, high)	Suggest a value for the continuous parameter. (Deprecated since v3.0; use suggest_float.)

The arguments of these methods are as follows.

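name: name of the hyperparameter
low: lower bound of the parameter's range
high: upper bound of the parameter's range
step: spacing between the values the parameter can take
q: discretization step (used by suggest_discrete_uniform)
log: True to sample the parameter from the log domain
choices: list of categorical values for the parameter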
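To make the effect of step and log concrete, here is a sketch of how those two arguments shape the values that can be drawn. sample_float is a hypothetical helper written for this illustration, not Optuna's sampling code, and it samples at random rather than with TPE; the restrictions it enforces (log=True needs a positive lower bound and cannot be combined with step) mirror the ones suggest_float documents.

```python
import math
import random


def sample_float(rng, low, high, step=None, log=False):
    # Hypothetical helper illustrating what suggest_float's step/log arguments mean
    if log:
        if step is not None:
            raise ValueError("step and log cannot be used together")
        if low <= 0:
            raise ValueError("log=True requires low > 0")
        # Uniform in log space: every decade in [low, high] is equally likely
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    if step is not None:
        # Only low, low + step, low + 2 * step, ..., high can be returned
        n_steps = round((high - low) / step)
        return low + step * rng.randint(0, n_steps)
    return rng.uniform(low, high)


rng = random.Random(0)
log_draws = [sample_float(rng, 1e-5, 1e-2, log=True) for _ in range(10_000)]
# Two of the three decades lie below 1e-3, so this fraction is close to 2/3;
# plain uniform sampling over the same range would give about 0.1.
print(sum(d < 1e-3 for d in log_draws) / len(log_draws))
print(sorted({round(sample_float(rng, 0.0, 1.0, step=0.1), 1) for _ in range(1_000)}))
```

This is why learning rates are usually searched with log=True: small values such as 1e-5 to 1e-4 get as much attention as large ones.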

Useful Optuna features

Optuna provides the following useful features.

Pruner
Distributed optimization
Dashboard

Pruner

Optuna has a feature called a Pruner that automatically stops unpromising trials early. A Pruner is specified as follows.

study = optuna.create_study(
    pruner=optuna.pruners.MedianPruner(),
)

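The code above specifies MedianPruner(), but other pruners are also available; see the official documentation for details. Pruning only takes effect if the objective function reports intermediate values with trial.report() and checks trial.should_prune(), as in the PyTorch example below.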
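MedianPruner is based on the median stopping rule. The sketch below shows a simplified version of that rule as a standalone function: a trial is pruned at a given step when its best intermediate value so far is worse than the median of the values earlier trials reported at the same step. should_prune here is a hypothetical helper, not Optuna's implementation; its n_startup_trials and n_warmup_steps arguments only loosely imitate the MedianPruner options of the same names.

```python
import statistics


def should_prune(current_values, previous_trials, step, direction="maximize",
                 n_startup_trials=5, n_warmup_steps=0):
    """Simplified median stopping rule (hypothetical helper, not Optuna's code).

    current_values: intermediate values this trial has reported for steps 0..step
    previous_trials: one {step: value} dict per earlier finished trial
    """
    # Do not prune until enough trials exist and the warm-up steps have passed
    if len(previous_trials) < n_startup_trials or step < n_warmup_steps:
        return False
    others = [t[step] for t in previous_trials if step in t]
    if not others:
        return False
    median = statistics.median(others)
    if direction == "maximize":
        return max(current_values) < median
    return min(current_values) > median


# Validation accuracy per epoch of five earlier trials
previous = [
    {0: 0.60, 1: 0.70, 2: 0.80},
    {0: 0.55, 1: 0.72, 2: 0.81},
    {0: 0.62, 1: 0.75, 2: 0.79},
    {0: 0.58, 1: 0.70, 2: 0.83},
    {0: 0.60, 1: 0.74, 2: 0.80},
]
print(should_prune([0.30, 0.40, 0.50], previous, step=2))  # True: 0.50 is below the median 0.80
print(should_prune([0.65, 0.78, 0.85], previous, step=2))  # False
```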

Distributed optimization

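Passing study_name and storage to create_study lets the trial history be shared between processes, which makes distributed optimization straightforward to implement. If no storage is specified, the optimization state is kept in memory, so every run starts the search from scratch.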

study = optuna.create_study(
    study_name="example-study",
    # SQLAlchemy URL: three slashes point to the file example.db in the current directory
    storage="sqlite:///example.db",
    load_if_exists=True,
)

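Setting load_if_exists=True allows a Study with the same name that already exists in the database to be loaded and resumed; otherwise create_study raises an error for a duplicate name.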
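To see why persistent storage lets a study resume while the in-memory default starts over, here is a small standard-library sketch that stores every finished trial in SQLite. The table layout and the run function are hypothetical and unrelated to the schema Optuna's RDBStorage actually uses; the objective is the same (x - 2) ** 2 example with plain random sampling.

```python
import os
import random
import sqlite3
import tempfile


def run(db_path, study_name, n_trials, seed):
    # Hypothetical schema (not Optuna's): one row per finished trial
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS trials (study TEXT, x REAL, value REAL)")
    rng = random.Random(seed)
    for _ in range(n_trials):
        x = rng.uniform(-10, 10)
        conn.execute("INSERT INTO trials VALUES (?, ?, ?)", (study_name, x, (x - 2) ** 2))
    conn.commit()
    # Count and best value over the whole stored history of this study
    count, best = conn.execute(
        "SELECT COUNT(*), MIN(value) FROM trials WHERE study = ?", (study_name,)
    ).fetchone()
    conn.close()
    return count, best


db_path = os.path.join(tempfile.mkdtemp(), "example.db")
first = run(db_path, "example-study", n_trials=20, seed=0)
second = run(db_path, "example-study", n_trials=20, seed=1)
print(first[0], second[0])  # 20 40: the second run continues the stored history
fresh = run(":memory:", "example-study", n_trials=20, seed=0)
print(fresh[0])  # 20: an in-memory database starts empty every time
```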

Dashboard

Optuna offers a dashboard for checking the progress of a search.

Optimizing a PyTorch model

The following example optimizes the validation accuracy of fashion item recognition using PyTorch and FashionMNIST.

import os

import optuna
from optuna.trial import TrialState
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets
from torchvision import transforms


DEVICE = torch.device("cpu")
BATCHSIZE = 128
CLASSES = 10
DIR = os.getcwd()
EPOCHS = 10
N_TRAIN_EXAMPLES = BATCHSIZE * 30
N_VALID_EXAMPLES = BATCHSIZE * 10


def define_model(trial: optuna.Trial):
    # We optimize the number of layers, hidden units and dropout ratio in each layer.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    layers = []

    in_features = 28 * 28
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i), 4, 128)
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        p = trial.suggest_float("dropout_l{}".format(i), 0.2, 0.5)
        layers.append(nn.Dropout(p))

        in_features = out_features
    layers.append(nn.Linear(in_features, CLASSES))
    layers.append(nn.LogSoftmax(dim=1))

    return nn.Sequential(*layers)


def get_mnist():
    # Load FashionMNIST dataset.
    train_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=True, download=True, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )
    valid_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST(DIR, train=False, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE,
        shuffle=True,
    )

    return train_loader, valid_loader


def objective(trial: optuna.Trial):

    # Generate the model.
    model = define_model(trial).to(DEVICE)

    # Generate the optimizers.
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "RMSprop", "SGD"])
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)

    # Get the FashionMNIST dataset.
    train_loader, valid_loader = get_mnist()

    # Training of the model.
    for epoch in range(EPOCHS):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            # Limiting training data for faster epochs.
            if batch_idx * BATCHSIZE >= N_TRAIN_EXAMPLES:
                break

            data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)

            optimizer.zero_grad()
            output = model(data)
            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()

        # Validation of the model.
        model.eval()
        correct = 0
        with torch.no_grad():
            for batch_idx, (data, target) in enumerate(valid_loader):
                # Limiting validation data.
                if batch_idx * BATCHSIZE >= N_VALID_EXAMPLES:
                    break
                data, target = data.view(data.size(0), -1).to(DEVICE), target.to(DEVICE)
                output = model(data)
                # Get the index of the max log-probability.
                pred = output.argmax(dim=1, keepdim=True)
                correct += pred.eq(target.view_as(pred)).sum().item()

        accuracy = correct / min(len(valid_loader.dataset), N_VALID_EXAMPLES)

        trial.report(accuracy, epoch)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.exceptions.TrialPruned()

    return accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100, timeout=600)

pruned_trials = study.get_trials(deepcopy=False, states=[TrialState.PRUNED])
complete_trials = study.get_trials(deepcopy=False, states=[TrialState.COMPLETE])

print("Study statistics: ")
print("  Number of finished trials: ", len(study.trials))
print("  Number of pruned trials: ", len(pruned_trials))
print("  Number of complete trials: ", len(complete_trials))

print("Best trial:")
trial = study.best_trial

print("  Value: ", trial.value)

print("  Params: ")
for key, value in trial.params.items():
    print("    {}: {}".format(key, value))

Study statistics:
  Number of finished trials:  100
  Number of pruned trials:  64
  Number of complete trials:  36
Best trial:
  Value:  0.8484375
  Params:
    n_layers: 1
    n_units_l0: 77
    dropout_l0: 0.2621844457931539
    optimizer: Adam
    lr: 0.0051477826780949205

Optimizing LightGBM

The following example optimizes the validation accuracy of breast cancer detection using LightGBM.

import numpy as np
import optuna

import lightgbm as lgb
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split

def objective(trial):
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)

    param = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 2, 256),
        "feature_fraction": trial.suggest_float("feature_fraction", 0.4, 1.0),
        "bagging_fraction": trial.suggest_float("bagging_fraction", 0.4, 1.0),
        "bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    gbm = lgb.train(param, dtrain)
    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)
    return accuracy

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)

[I 2023-01-20 09:18:25,197] A new study created in memory with name: no-name-26038220-3fba-4ada-9237-9ad9e0a7eff4
[I 2023-01-20 09:18:25,278] Trial 0 finished with value: 0.951048951048951 and parameters: {'lambda_l1': 3.6320373475789714e-05, 'lambda_l2': 0.0001638841686303377, 'num_leaves': 52, 'feature_fraction': 0.5051855392259837, 'bagging_fraction': 0.48918754678745996, 'bagging_freq': 4, 'min_child_samples': 30}. Best is trial 0 with value: 0.951048951048951.
.
.
.
[I 2023-01-20 09:18:37,148] Trial 99 finished with value: 0.972027972027972 and parameters: {'lambda_l1': 4.921752856772178e-06, 'lambda_l2': 5.0633857392202624e-08, 'num_leaves': 28, 'feature_fraction': 0.48257699231443446, 'bagging_fraction': 0.7810382257111896, 'bagging_freq': 3, 'min_child_samples': 28}. Best is trial 36 with value: 0.993006993006993.

print(f"Number of finished trials: {len(study.trials)}")
print("Best trial:")

trial = study.best_trial

print(f"  Value: {trial.value}")

print("  Params: ")
for key, value in trial.params.items():
    print(f"    {key}: {value}")

Number of finished trials: 100
Best trial:
  Value: 0.993006993006993
  Params:
    lambda_l1: 2.2820624207211886e-06
    lambda_l2: 4.100655307616414e-08
    num_leaves: 253
    feature_fraction: 0.6477416602072985
    bagging_fraction: 0.7393534933706116
    bagging_freq: 5
    min_child_samples: 36

LightGBM Tuner

For LightGBM specifically, Optuna provides LightGBM Tuner, which makes tuning LightGBM even simpler.

Note, however, that LightGBM Tuner tunes only the following hyperparameters.

lambda_l1
lambda_l2
num_leaves
feature_fraction
bagging_fraction
bagging_freq
min_child_samples

import numpy as np
import optuna.integration.lightgbm as lgb

from lightgbm import early_stopping
from lightgbm import log_evaluation
import sklearn.datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


if __name__ == "__main__":
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, val_x, train_y, val_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)
    dval = lgb.Dataset(val_x, label=val_y)

    params = {
        "objective": "binary",
        "metric": "binary_logloss",
        "verbosity": -1,
        "boosting_type": "gbdt",
    }

    model = lgb.train(
        params,
        dtrain,
        valid_sets=[dtrain, dval],
        callbacks=[early_stopping(100), log_evaluation(100)],
    )

    prediction = np.rint(model.predict(val_x, num_iteration=model.best_iteration))
    accuracy = accuracy_score(val_y, prediction)

    best_params = model.params
    print("Best params:", best_params)
    print("  Accuracy = {}".format(accuracy))
    print("  Params: ")
    for key, value in best_params.items():
        print("    {}: {}".format(key, value))

Best params: {'objective': 'binary', 'metric': 'binary_logloss', 'verbosity': -1, 'boosting_type': 'gbdt', 'feature_pre_filter': False, 'lambda_l1': 3.9283033758323693e-07, 'lambda_l2': 0.11914982777201996, 'num_leaves': 4, 'feature_fraction': 0.4, 'bagging_fraction': 0.46448877892449625, 'bagging_freq': 3, 'min_child_samples': 20}
  Accuracy = 0.9790209790209791
  Params:
    objective: binary
    metric: binary_logloss
    verbosity: -1
    boosting_type: gbdt
    feature_pre_filter: False
    lambda_l1: 3.9283033758323693e-07
    lambda_l2: 0.11914982777201996
    num_leaves: 4
    feature_fraction: 0.4
    bagging_fraction: 0.46448877892449625
    bagging_freq: 3
    min_child_samples: 20

Other optimization examples

Many more examples of using Optuna with other models are available on GitHub.


Ryusei Kakujo

Weave the future of cities through data

Transportation modeling/ Urban planning/ Machine learning/ Computer science/ GIS