2023-01-07

ML model delivery patterns


Container technology has made it common to run inference servers from container images. However, managing and versioning the server images and the inference model files is an important issue to consider.

There are two main ways to combine a model with a server and run it as an inference server.

  • Model-in-image pattern (the model is bundled into the server image)
  • Model-load pattern (the model is loaded at server startup)

Model-in-image pattern

In the Model-in-image pattern, the model file is included in the inference server image at build time. By including the model in the image, each image becomes an inference server dedicated to that specific model.

(Figure: Model-in-image pattern)

  • Pros
    • Keep server image and model file versions the same
  • Cons
    • Need to define pipeline from model training to server image building
    • Larger inference server images, so pulling and starting them takes longer
  • Use cases
    • When you want to match the server image and inference model versions

Model-load pattern

In the Model-load pattern, the inference server is started first, the model file is then loaded from external storage, and only after that is the inference server put into production. The server image and the inference model files are managed and versioned separately.

(Figure: Model-load pattern)

  • Pros
    • Can separate server image versions from model file versions
    • The same server image can be reused across different models
    • Server images will be lightweight
  • Cons
    • Server deployment and model file loading are done in sequence, so it may take longer to start the inference server
    • Requires tracking which server image versions and model file versions are compatible with each other
  • Use cases
    • When the model file version is updated more frequently than the server image version
    • When running multiple types of inference models on the same server image

References

https://github.com/mercari/ml-system-design-pattern/tree/master/Training-patterns

Ryusei Kakujo
