2023-01-07

QA patterns for ML systems

There are various ways to perform quality assurance (QA) for machine learning (ML) systems. In this article, I introduce the following two QA patterns, as described in Mercari's ml-system-design-pattern repository on GitHub.

  • Shadow A/B test pattern
  • Online A/B test pattern

Shadow A/B test pattern

In the shadow A/B test pattern, multiple inference servers run side by side. A proxy server sends every request to all of the inference models, but only the current model's result is returned to the client. This makes it possible to test a new inference model on production traffic without affecting the production service. On the other hand, because the new model's results never reach the client, it is difficult to measure the model's final business value.

Figure: Shadow A/B testing pattern
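
The request flow of the shadow pattern can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not the implementation from Mercari's repository: the names `ShadowProxy` and `shadow_log` are hypothetical, each model is treated as a plain callable, and an in-memory list stands in for a real log store. In a real system the models would sit behind network endpoints.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Model = Callable[[Any], Any]  # assumption: a model is any callable taking one request


class ShadowProxy:
    """Serves the current model and mirrors each request to shadow models."""

    def __init__(self, current: Model, shadows: Dict[str, Model]) -> None:
        self.current = current
        self.shadows = shadows
        self.shadow_log: List[dict] = []  # hypothetical stand-in for a log store
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _run_shadow(self, name: str, model: Model, request: Any) -> None:
        record = {"model": name, "request": request, "prediction": None, "error": None}
        try:
            record["prediction"] = model(request)
        except Exception as exc:  # a shadow failure must never reach the client
            record["error"] = repr(exc)
        self.shadow_log.append(record)

    def predict(self, request: Any) -> Any:
        # Mirror the request to every shadow model in the background.
        for name, model in self.shadows.items():
            self._pool.submit(self._run_shadow, name, model, request)
        # Only the current model's result is returned to the client.
        return self.current(request)

    def close(self) -> None:
        # Wait for outstanding shadow calls so the log is complete.
        self._pool.shutdown(wait=True)
```

Running the shadow calls on a background pool is a design choice in this sketch: it keeps a slow or failing new model from adding latency or errors to the response that the client sees.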

  • Pros
    • Inference results, speed and availability of new models can be checked in production service
    • Inference results from multiple models can be collected and analyzed
    • The production service is not affected
  • Cons
    • Business value is difficult to measure because end users never see the new model's results, so their reactions cannot be observed
  • Use cases
    • When you want to confirm that the new inference model can run inference on production data without problems
    • When you want to verify that the new inference server can handle the production request load
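
Once predictions and latencies from both models have been collected, they can be compared offline. The helpers below are a rough sketch of two such checks: how often the new model agrees with the current one, and a nearest-rank latency percentile. The function names and the sample numbers in the usage note are illustrative assumptions, not something prescribed by the pattern.

```python
import math
from typing import Sequence


def agreement_rate(current: Sequence, shadow: Sequence) -> float:
    """Fraction of requests where both models returned the same prediction."""
    if len(current) != len(shadow):
        raise ValueError("both sequences must cover the same requests")
    if not current:
        raise ValueError("no predictions to compare")
    matches = sum(1 for c, s in zip(current, shadow) if c == s)
    return matches / len(current)


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("no values given")
    if not 0 < q <= 100:
        raise ValueError("q must be in (0, 100]")
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]
```

For example, `agreement_rate(["cat", "dog", "cat", "bird"], ["cat", "dog", "dog", "bird"])` gives 0.75, and `percentile([120, 80, 100, 300, 90], 95)` gives 300 for made-up latencies in milliseconds. A low agreement rate or a much higher tail latency for the new server would be a reason to investigate before moving on to an online test.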

Online A/B test pattern

In the online A/B test pattern, multiple inference servers run side by side. Most requests are routed to the current server, and a share of the traffic is gradually shifted to the new server. A proxy server controls how much traffic each server receives. Because the new model is connected to the production system and its inference results are returned to the client, this pattern has a direct business impact.

Figure: Online A/B testing pattern
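
The traffic split performed by the proxy can be sketched as follows. This is only one possible approach and rests on my own assumptions: `ABProxy` and `candidate_percent` are hypothetical names, models are plain callables, and users are assigned by hashing a user ID so that the same user always sees the same model. The source pattern only requires that the proxy controls the share of traffic; hash-based sticky assignment is my addition.

```python
import hashlib
from typing import Any, Callable, Tuple

Model = Callable[[Any], Any]  # assumption: a model is any callable taking one request


class ABProxy:
    """Routes a configurable percentage of users to the candidate model."""

    def __init__(self, current: Model, candidate: Model, candidate_percent: int = 0) -> None:
        self.current = current
        self.candidate = candidate
        self.candidate_percent = 0
        self.set_candidate_percent(candidate_percent)

    def set_candidate_percent(self, percent: int) -> None:
        # Raising this value step by step shifts traffic gradually.
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self.candidate_percent = percent

    @staticmethod
    def _bucket(user_id: str) -> int:
        # Stable bucket in [0, 100) derived from the user ID.
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % 100

    def assign(self, user_id: str) -> str:
        return "candidate" if self._bucket(user_id) < self.candidate_percent else "current"

    def predict(self, user_id: str, request: Any) -> Tuple[str, Any]:
        variant = self.assign(user_id)
        model = self.candidate if variant == "candidate" else self.current
        # The variant is returned too, so downstream logs can attribute outcomes.
        return variant, model(request)
```

Because the bucket depends only on the user ID, raising the percentage adds new users to the candidate group without moving anyone back, which keeps each user's experience consistent during the ramp-up.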

  • Pros
    • Inference results, speed and availability of new models can be checked in production service
    • Inference results from multiple models can be collected and analyzed
    • End-user reactions can be observed
  • Cons
    • The new model may have a negative impact on the business
  • Use cases
    • When you want to confirm that the new inference model can run inference on production data without problems
    • When you want to verify that the new inference server can handle the production request load
    • When you want to measure the business value of multiple inference models online
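
Measuring business value online usually comes down to comparing an outcome metric between the two groups. As one hedged example, the sketch below compares conversion rates with a two-proportion z-test. The choice of conversion rate as the metric, the choice of this particular test, and the function name are my assumptions; the source pattern does not prescribe a specific analysis method.

```python
import math
from typing import Tuple


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rate, B minus A."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one observation")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With made-up counts, `two_proportion_z_test(100, 1000, 150, 1000)` compares a 10% conversion rate for the current model against 15% for the new one. A positive `z` with a small p-value suggests the new model converts better, though in practice the sample size and test duration should be decided before the experiment starts.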

References

https://github.com/mercari/ml-system-design-pattern/tree/master/QA-patterns

Ryusei Kakujo
