2023-01-07

ML model delivery patterns

Model delivery patterns

Container technology has made it common to run servers from container images. How to manage and version the server image and the inference model file, however, is an important design question.

There are two main ways to deliver a model to a server and run it as an inference server.

- Model-in-image pattern (the model is contained in the server image)
- Model-load pattern (the model is loaded by the server after it starts)

Model-in-image pattern

In the Model-in-image pattern, the model file is included in the inference server image at build time. Because the model is part of the image, each image yields an inference server dedicated to that model.


Pros
- The server image and the model file always share the same version
Cons
- A pipeline from model training to server image building must be defined
- The inference server image is larger, so it takes longer to pull and start
Use cases
- When the server image and inference model versions must match
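
The server side of the Model-in-image pattern can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the path `/opt/app/model.json`, the `LinearModel` class, and the `load_baked_model` function are hypothetical names chosen for the example, and the model is a toy linear function stored as JSON rather than a real trained model. The point is only that the server reads the model from its own filesystem, where the image build placed it, and never reaches out to external storage.

```python
import json
from pathlib import Path

# Hypothetical location where the image build step copied the model file.
# In this pattern the file is already present when the container starts.
BAKED_MODEL_PATH = Path("/opt/app/model.json")


class LinearModel:
    """Toy stand-in for a trained model: y = weight * x + bias."""

    def __init__(self, weight: float, bias: float) -> None:
        self.weight = weight
        self.bias = bias

    def predict(self, x: float) -> float:
        return self.weight * x + self.bias


def load_baked_model(path: Path = BAKED_MODEL_PATH) -> LinearModel:
    """Load the model from the image's local filesystem (no network access)."""
    params = json.loads(path.read_text())
    return LinearModel(weight=params["weight"], bias=params["bias"])
```

Because the model file travels inside the image, a single image tag identifies both the server code and the model, which is what keeps the two versions in step.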

Model-load pattern

In the Model-load pattern, the inference server starts first, then loads the model files, and only then is put into production. The server image and the inference model files are managed and versioned separately.
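
The startup sequence of the Model-load pattern might look something like the sketch below. It assumes the model is published at some URL and is a toy linear function stored as JSON. The `InferenceServer` class, its `load_model` method, and the cache file name are hypothetical names for illustration. The server object is created without any model, fetches the file, and reports itself ready only after loading succeeds.

```python
import json
import shutil
import urllib.request
from pathlib import Path


class InferenceServer:
    """Toy server that starts without a model and becomes ready after loading one."""

    def __init__(self) -> None:
        self.params = None  # nothing is bundled with the server image

    @property
    def ready(self) -> bool:
        # A readiness check could use this to hold back traffic until the model is loaded.
        return self.params is not None

    def load_model(self, model_url: str, cache_dir: Path) -> None:
        """Download the model file from model_url, then load it into memory."""
        cache_dir.mkdir(parents=True, exist_ok=True)
        local_path = cache_dir / "model.json"
        with urllib.request.urlopen(model_url) as resp, local_path.open("wb") as out:
            shutil.copyfileobj(resp, out)
        self.params = json.loads(local_path.read_text())

    def predict(self, x: float) -> float:
        if not self.ready:
            raise RuntimeError("model not loaded yet")
        return self.params["weight"] * x + self.params["bias"]
```

In a deployment, the URL would typically come from configuration, for example a hypothetical `MODEL_URL` environment variable, so the same server image can serve different model versions without being rebuilt.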



Ryusei Kakujo