What is MLflow
MLflow is an OSS tool for managing the machine learning lifecycle. MLflow offers the following functions:
- MLflow Tracking: experiment management
- MLflow Projects: management of runtime environments
- MLflow Models: packaging and deployment of models
- MLflow Model Registry: model versioning
This article deals with MLflow Tracking, an experiment management feature.
MLflow Tracking
MLflow experiment management consists of the following three components.
- Run: a single trial (e.g., one experiment or study)
- Experiment: a group that binds related Runs together
- Artifact: storage for the outputs and intermediate products of a Run
Let's actually use MLflow Tracking. First, install the library.
$ pip install mlflow
Then save the following code as main.py and run it.
import os
from random import random, randint
from mlflow import log_metric, log_param, log_artifacts

if __name__ == "__main__":
    # Log a parameter (key-value pair)
    log_param("param1", randint(0, 100))

    # Log a metric; metrics can be updated throughout the run
    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)

    # Log an artifact (output file)
    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("Hello world!")
    log_artifacts("outputs")  # Log the whole folder as artifacts
$ python main.py
An mlruns folder and an outputs folder will be created.
.
├── __init__.py
├── main.py
├── mlruns
│ └── 0
│ ├── 21af48fda35a4aa1b61ef3622f71e4c0
│ │ ├── artifacts
│ │ │ └── test.txt
│ │ ├── meta.yaml
│ │ ├── metrics
│ │ │ └── foo
│ │ ├── params
│ │ │ └── param1
│ │ └── tags
│ │ ├── mlflow.runName
│ │ ├── mlflow.source.git.commit
│ │ ├── mlflow.source.name
│ │ ├── mlflow.source.type
│ │ └── mlflow.user
│ └── meta.yaml
└── outputs
└── test.txt
0 is the Experiment ID, and 21af48fda35a4aa1b61ef3622f71e4c0 is the Run ID.
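These IDs can also be retrieved programmatically. A minimal sketch, assuming the default local mlruns store created above:

import mlflow

# Look up the default Experiment by its ID ("0" in the tree above)
exp = mlflow.get_experiment("0")
print(exp.name)

# Every new Run is assigned a unique Run ID
with mlflow.start_run() as run:
    print(run.info.run_id)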
Run the mlflow ui command.
$ mlflow ui
[2023-01-08 15:41:46 +0900] [54928] [INFO] Starting gunicorn 20.1.0
[2023-01-08 15:41:46 +0900] [54928] [INFO] Listening at: http://127.0.0.1:5000 (54928)
[2023-01-08 15:41:46 +0900] [54928] [INFO] Using worker: sync
[2023-01-08 15:41:46 +0900] [54930] [INFO] Booting worker with pid: 54930
[2023-01-08 15:41:47 +0900] [54931] [INFO] Booting worker with pid: 54931
[2023-01-08 15:41:47 +0900] [54932] [INFO] Booting worker with pid: 54932
[2023-01-08 15:41:47 +0900] [54933] [INFO] Booting worker with pid: 54933
[2023-01-08 15:41:55 +0900] [54928] [INFO] Handling signal: winch
[2023-01-08 15:41:58 +0900] [54928] [INFO] Handling signal: winch
You can access http://127.0.0.1:5000 to view the results of the experiment in your browser.
What MLflow records
MLflow records the following four main kinds of values:
- Parameters: parameters of an experiment run
- Tags: tags attached to an experiment run
- Metrics: metrics measured during an experiment
- Artifacts: files generated by an experiment
Logging functions
MLflow can log data to a Run using the Python, R, and Java APIs, or the REST API.
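For reference, here is a minimal Python sketch that logs one value of each of the four kinds; the names and values are purely illustrative:

import mlflow

with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.05)      # Parameters
    mlflow.set_tag("model_family", "lightgbm")   # Tags
    mlflow.log_metric("accuracy", 0.93, step=1)  # Metrics

    # Artifacts: any file can be logged
    with open("notes.txt", "w") as f:
        f.write("experiment notes")
    mlflow.log_artifact("notes.txt")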
MLflow Tracking Server
You can use the mlflow server command to set up an MLflow Tracking server.
$ mlflow server \
--backend-store-uri /mnt/persistent-disk \
--default-artifact-root s3://my-mlflow-bucket/ \
--host 0.0.0.0
The MLflow Tracking server has two storage-related components:
- Backend Stores (--backend-store-uri)
- Artifact Stores (--default-artifact-root)
Backend Stores
Backend Stores are where experiment and run metadata (run parameters, metrics, and tags) are stored.
The following Backend Stores can be specified in --backend-store-uri:
- Local file system: ./path_to_store or file:/path_to_store
- SQLAlchemy-compatible DB: <dialect>+<driver>://<username>:<password>@<host>:<port>/<database>
The default value of --backend-store-uri is ./mlruns.
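As an example of a SQLAlchemy-compatible Backend Store, a local SQLite file works without any server setup. A sketch, where the mlflow.db filename is just an illustration:

import mlflow

# Log directly to a SQLite Backend Store (the file is created on first use)
mlflow.set_tracking_uri("sqlite:///mlflow.db")

with mlflow.start_run():
    mlflow.log_param("param1", 42)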
Artifact Stores
Artifact Stores are where Artifacts are stored. The following storage backends are supported:
- Amazon S3
- Google Cloud Storage
- Azure Blob Storage
- FTP server
- SFTP Server
- NFS
- HDFS
--default-artifact-root specifies the default Artifact location.
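To check where a given Run's Artifacts will be written, you can query the artifact URI from inside the run; a minimal sketch:

import mlflow

with mlflow.start_run():
    # Typically resolves to <artifact root>/<experiment id>/<run id>/artifacts
    print(mlflow.get_artifact_uri())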
Automatic logging
MLflow's autologging feature automatically logs metrics, parameters, and models without requiring explicit logging code. The following libraries support automatic logging:
- Scikit-learn
- Keras
- Gluon
- XGBoost
- LightGBM
- Statsmodels
- Spark
- Fastai
- PyTorch
There are two ways to enable automatic logging:
- Call the mlflow.autolog() function before the training code
- Call a library-specific function (e.g., mlflow.sklearn.autolog())
The following code is an example of scikit-learn autologging.
import mlflow
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

mlflow.autolog()

db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

# Create and train the model; autologging captures params, metrics, and the model.
rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
rf.fit(X_train, y_train)

# Use the model to make predictions on the test dataset.
predictions = rf.predict(X_test)

autolog_run = mlflow.last_active_run()
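Continuing the example above, the Run object returned by mlflow.last_active_run() exposes what autologging recorded (the exact keys depend on your MLflow and scikit-learn versions):

print(autolog_run.info.run_id)
print(autolog_run.data.params)   # e.g. n_estimators, max_depth, ...
print(autolog_run.data.metrics)  # training metrics captured by autologging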
LightGBM experiments
Let's run a LightGBM classification experiment on the Iris dataset.
First, set up an MLflow Tracking server.
$ mlflow server \
--backend-store-uri ./mlruns \
--default-artifact-root gs://GCS_BUCKET/mlruns \
--host 0.0.0.0
Then save the following code as lightgbm_experiment.py and run it twice.
from datetime import datetime

import lightgbm as lgb
from sklearn import datasets
from sklearn.model_selection import train_test_split

import mlflow

params = dict(
    test_size=0.2,
    random_state=42,
)

iris = datasets.load_iris()
X, y = iris.data, iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, **params)

lgb_params = dict(
    learning_rate=0.05,
    n_estimators=500,
)
model = lgb.LGBMClassifier(**lgb_params)

def mlflow_callback():
    # LightGBM callback that forwards each evaluation result to MLflow
    def callback(env):
        for name, loss_name, loss_value, _ in env.evaluation_result_list:
            mlflow.log_metric(key=loss_name, value=loss_value, step=env.iteration)
    return callback

mlflow.set_tracking_uri("http://127.0.0.1:5000")
mlflow.set_experiment("EXP-1")

with mlflow.start_run(run_name=str(datetime.now())):
    mlflow.log_params({**params, **lgb_params})
    model.fit(
        X_train,
        y_train,
        eval_set=[(X_test, y_test)],
        eval_metric=["softmax"],
        callbacks=[
            lgb.early_stopping(10),
            mlflow_callback(),
        ],
    )

    # Log an artifact (output file)
    with open("output.txt", "w") as f:
        f.write("Hello world!")
    mlflow.log_artifact("output.txt")
$ python lightgbm_experiment.py
$ python lightgbm_experiment.py
When you go to http://127.0.0.1:5000, you will see two Runs in an Experiment called EXP-1.
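The Runs can also be fetched programmatically for comparison, e.g. in a notebook. A sketch, assuming a reasonably recent MLflow (the experiment_names argument of mlflow.search_runs is not available in very old releases):

import mlflow

mlflow.set_tracking_uri("http://127.0.0.1:5000")

# Returns a pandas DataFrame with one row per Run;
# logged params/metrics appear as "params.*" / "metrics.*" columns
runs = mlflow.search_runs(experiment_names=["EXP-1"])
print(runs[["run_id", "params.learning_rate"]])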